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Participants in longitudinal studies on the eflects of drug treat- 
ment and criminal justice system interventions are at high risk for in- 
stitutionalization (e.g., spending time in an environment where their 
freedom to use drugs, commit crimes, or engage in risky behavior 
may be circumscribed). Methods used for estimating treatment ef- 
fects in the presence of institutionalization during follow-up can be 
highly sensitive to assumptions that are unlikely to be met in appli- 
cations and thus likely to yield misleading inferences. In this paper 
we consider the use of principal stratification to control for institu- 
tionalization at follow-up. Principal stratification has been suggested 
for similar problems where outcomes are unobservable for samples of 
study participants because of dropout, death or other forms of censor- 
ing. The method identifies principal strata within which causal effects 
are well defined and potentially estimable. We extend the method 
of principal stratification to model institutionalization at follow-up 
and estimate the effect of residential substance abuse treatment ver- 
sus outpatient services in a large scale study of adolescent substance 
abuse treatment programs. Additionally, we discuss practical issues in 
applying the principal stratification model to data. We show via sim- 
ulation studies that the model can only recover true effects provided 
the data meet strenuous demands and that there must be caution 
taken when implementing principal stratification as a technique to 
control for post-treatment confounders such as institutionalization. 



1. Introduction. Each year almost 1.8 million Americans receive alcohol 
and other drug treatment services. Efforts to improve these services through 
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research or provider profiling are hindered by drug treatment chents' high 
rates of post-treatment institutionahzation, defined here as spending a day 
or more in a controlled environment (e.g., a jail, prison, hospital, residential 
treatment or group home setting) where the possibility of drug use and 
criminal activity is substantially diminished. By reducing the potential for 
substance use, institutionalization masks the potential effects of substance 
use treatment programs. For example, outcomes of clients from both effective 
and ineffective treatment programs will look the same when they have no 
access to drugs or alcohol. 

The confounding of treatment effects due to institutionalization cannot be 
ignored when evaluating substance abuse treatment programs, because in- 
stitutionalization is so pervasive among drug treatment clients. For instance, 
in the Drug Abuse Treatment Outcomes Study [Hubbard et al. (1997)], 40% 
of the 2,966 clients in substance abuse treatment programs interviewed 12- 
months after discharge reported being institutionalized for some part of the 
preceding year [U.S. Dept. of Health and Human Services, National Insti- 
tute on Drug Abuse (2004)]. Among those with any institutionalization, 
the average number of days institutionalized out of the past 365 was 115 
[U.S. Dept. of Health and Human Services, National Institute on Drug Abuse 
(2004)]. Similarly, about 52% of the sample from the National Treatment 
Improvement Evaluation (NTIES) [U.S. Dept. of Health and Human Ser- 
vices, Substance Abuse and Mental Health Services Administration, Center 
for Substance Abuse Treatment (2004), NORC and RTI (1997)] were insti- 
tutionalized during the study's 12-month post-treatment evaluation, many 
for the entire length of the evaluation period. 

The confounding effects of institutionalization are not, however, limited 
to substance abuse treatment programs. Criminal justice system clients are 
also at great risk for being institutionalized following an intervention. More- 
over, patients involved in many health studies are at risk for hospitalizations 
which can limit or suppress the measurement of primary outcomes of inter- 
est in those studies (e.g., daily physical activities for a general population 
or falls for geriatric patients). In all of these post-treatment factor 

(e.g., institutionalization or hospitalization) whose value is determined after 
the start of treatment consequently determines the potential range of the 
outcome that can be observed, thereby suppressing or censoring the outcome 
of primary interest. In many situations, inferences are further complicated 
by the fact that the confounding factor can take on many levels giving rise to 
outcomes which are observed at different values of the confounding variable 
and invalidating the comparability of outcomes in the treatment and control 
groups. 

In a recent paper McCaffrey et al. (2007) developed a statistical model 
to describe the different possible estimates of the causal effect of treatment 
in the presence of institutionalization and the potential confounding effects 
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of institutionalization on these estimates. The paper identifies commonly 
used analytic methods for estimating treatment effects in the presence of 
institutionalization and demonstrates that the estimated treatment effects 
can vary greatly depending on which method is employed. The paper also 
identifies assumptions under which the various approaches yield unbiased 
estimates of the treatment effects of interest. Unfortunately many of the 
assumptions required appear unlikely to hold in most real world applications. 

Institutionalization is similar to the problem of incomplete compliance 
in drug trials. In both problems, cases assigned to treatment have multiple 
potential outcomes that can vary depending on a variable not controlled by 
the experimenter: dose of the drug (i.e., level of compliance) in drug tri- 
als or days institutionalized in our example. Hence, methods for estimating 
the dose-response curve from partial compliance data [Efron and Feldman 
(1991)] might apply to the institutionalization problem. However, these 
methods make substantial use of the fact that treatment is a complete dose 
of the drug, whereas, in our example institutionalization is not related to the 
amount or type of treatment received but is an external event that curtails 
use censoring the outcomes of interest. 

The problem of institutionalization is more similar to the problems ad- 
dressed by the methods of principal stratification. Developed by Frangakis 
and Rubin (2002) for problems where outcomes are unobservable for samples 
of study participants because of dropout, death or other forms of censoring 
(e.g., failure to find employment in a jobs program), the method identifies 
principal strata within which causal effects are well defined and potentially 
estimable. Roughly speaking, Frangakis and Rubin (2002) noted that in data 
with censoring, cases that receive treatment and have observed outcomes are 
a mix of cases that would have observed outcomes under control and those 
that would not. Similarly, cases that receive the control condition and have 
observed outcomes are a mix of cases that would have observed outcomes 
under treatment and those that would not. A study participant's censoring 
statuses under the treatment and control conditions define the strata within 
the population, called principal strata. Analyses that condition on cases in 
the stratum where participants would be observed under both treatment and 
control can provide unbiased treatment effect estimates for that stratum as- 
suming no other biasing factors. The key innovation to the work of principal 
stratification is the recognition that conditioning on censoring statuses un- 
der treatment and control is sufficient to allow for unbiased estimation of 
treatment effects. 

Principal strata notions extend to the problem of institutionalization at 
follow-up with only slight modifications to the methods for censored data. In 
particular, modifications are necessary because institutionalization can take 
on more than the two levels of censored and uncensored. Also, outcomes are 
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observed at all levels of institutionalization and treatment effects at various 
levels may be of interest. 

Using data from the Adolescent Treatment Models (ATM) study, fielded 
between 1998 and 2002 by the Substance Abuse and Mental Health Services 
Administration, Center for Substance Abuse Treatment, we aim to exam- 
ine the effects of treatment modality (residential versus outpatient) on the 
12-month substance use outcomes for adolescents who participated in the 
ATM using principal stratification. In the ATM, high rates of institution- 
alization clearly confound the effects of treatment modality. Adolescents in 
residential treatment have significantly higher rates of institutionalization 
and lower mean drug use frequency outcomes at the 12-month follow-up. 
Given that institutionalization in the sample can be shown to yield lower 
drug use outcomes [McCaffrey et al. (2007)], it is unclear whether adoles- 
cents in residential treatment truly have lower mean values of the outcome 
because the treatment is more effective than outpatient treatment or because 
they tend to be institutionalized more often. 

The goal of this paper is to extend principal stratification to control for 
the confounding effects of institutionalization and to evaluate the sensitiv- 
ity of the extended model to various assumptions about the data. Section 
2 describes the data from the ATM study and illustrates the confounding 
effects institutionalization is likely to have when examining treatment ef- 
fects in this study. Section 3 describes the method of principal stratification 
and its extension in more detail and applies the method to the ATM data 
to examine the effects of substance use treatment modality (residential ver- 
sus outpatient) on drug use outcomes for adolescents in the study. Section 
4 evaluates the principal stratification method presented using a series of 
simulation studies and Section 5 provides a discussion of our findings and 
recommendations for practice and future research. 

2. Adolescent treatment models study and the institutionalization con- 
found. The number of adolescents receiving substance abuse treatment has 
increased by over 65% in the last 10 or so years and policy makers, clini- 
cians and parents want to know if treatment is effective and for whom it is 
effective, as well as what treatment modalities are best. We address these 
questions through a study of the effects of treatment modality (residential 
versus outpatient) on the 12-month substance use outcomes for adolescents 
who participated in the ATM study. The ATM study collected treatment ad- 
mission and 12-month outcomes data for new admissions to 10 community- 
based treatment programs in the United States, including six residential 
programs and four outpatient programs [Stevens and Morral (2003)]. 

The sample used in the present analysis includes all new admissions to 
the 10 programs in the main ATM analytic dataset, which was produced 
in March of 2002. Of these 1,384 cases, 1,256 (91%) completed a 12-month 
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follow-up survey and provided data on the outcomes of interest. Only cases 
with follow-up data are included in the analysis presented below. 

For purposes of the illustrations presented in this report, we examine the 
effects of treatment modality on the Substance Frequency Scale (SFS), a 
widely used scale from the Global Appraisal of Individual Needs (GAIN) 
[Dennis (1999)], the survey instrument used at every site for baseline and 
12-month outcome assessments. The SFS averages responses to a series of 
questions on the frequency of recent drug use, intoxication and drug prob- 
lems in the 90 days prior to the 12 month follow-up.^ It is scaled so that 
higher values indicate greater substance use and more drug problems. Days 
of institutionalization at follow-up is assessed with the maximum number 
of days — in the past 90 — which the respondent reports being in any of sev- 
eral different types of controlled environments (e.g., inpatient psychiatric 
or medical hospitals, residential treatment facilities, juvenile halls or other 
criminal justice detention facilities, etc.). 

Figure 1 shows a scatter plot and smooth of mean SFS versus the number 
of months institutionalized for adolescents enrolled in residential and outpa- 
tient care treatment modalities in the ATM. As shown, more adolescents in 
residential treatment have months of institutionalization greater than 0. In 
fact, 52% of adolescents in residential care were institutionalized for at least 
one day in the past 90 at the 12-month follow-up, while only 38% of ado- 
lescents in outpatient care were institutionalized at follow-up (see Table 1). 
Figure 1 also reveals the suppression effect that institutionalization can have 
on SFS. As the number of months institutionalized increases, the observed 
values of SFS in the ATM data appear to decrease. This relationship can 
be seen more clearly in the smooth of SFS versus days institutionalized also 
shown in the figure. 

In addition to the suppression effect of institutionalization. Figure 1 also 
clearly reveals the potential selection effects that exists in this data be- 
tween adolescents with and without any days of institutionalization. The 
mean value of SFS for adolescents who are not institutionalized at follow-up 
(represented by the black X above days institutionalized in Figure 1) is 
markedly lower than the mean values for adolescents with only a few days 
of institutionalization (shown along the smoothed line in Figure 1). The 
difference in mean SFS between these adolescents suggests that adolescents 
who were institutionalized in the 90 days prior to the 12 month follow-up 
tended to be more difficult cases with higher levels of drug use than those 
adolescents not entering institutional settings. 

Both the suppression and selection effects of institutionalization in the 
ATM data could distort inferences about the effects of treatment on SFS. 



In this illustration we multiplied SFS by 90 to make it scale with use in the past 90 
days rather than use per day. 
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Table 1 

Weighted mean rate of institutionalization and SFS at 12-month follow-up 
by treatment modality 





Percent institutionalized at follow-up 


Mean SFS 


Residential treatment 


52% 


9.0 


Outpatient treatment 


38% 


9.7 



Table 1 on page 6 provides weighted descriptive statistics (see below for 
more details on weighting) for SFS and institutionalization at follow-up by 
treatment modality. As shown, adolescents in residential treatment have 
higher rates of institutionalization and lower mean SFS. Given that institu- 
tionalization appears to suppress substance use in this sample, it is unclear 
whether adolescents in residential treatment have lower mean values of SFS 
because they have decreased their substance use in response to residential 
treatment or because they tend to be institutionalized more often. 

Unfortunately, as described in McCaffrey et al. (2007), current methods 
for handling institutionalization are not adequate and require strong as- 
sumptions which are unlikely to hold in practice. In light of these findings, 
we propose the use of principal stratification to obtain policy-relevant treat- 




I 1 1 1 

Months Institionalized 



Fig. 1. Scatter plot of SFS values by number of months institutionalized for adolescents 
in both residential and outpatient treatment modalities with smooth of mean SFS by months 
institutionalized overlayed. The black X denotes the mean SFS value among adolescents 
with days of institutionalization. 
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ment effects on this data which appropriately control for the suppression 
and selection effects institutionalization can have on SFS. 

3. Principal stratification. Principal stratification was developed by Pran- 
gakis and Rubin (2002) as a method for accounting for post-treatment con- 
founds within the context of the Neyman-Rubin causal model. Hence, we 
begin by extending the Neyman-Rubin causal model to account for insti- 
tutionalization and then turn to describe the specific innovations of the 
principal stratification approach. 

3.1. A causal model for treatment effects in the presence of institutional- 
ization. We start by considering the treatment effect of a single intervention 
versus a control in the simple case without any institutionalization. In this 
case, the Neyman-Rubin causal model [Holland (1986), Pearl (1996)] con- 
siders two potential outcomes and one random variable for each case in the 
study. The potential outcomes are Yq, the outcome after receiving the con- 
trol condition, and Yi, the outcome after receiving the treatment condition. 
Throughout we assume that the Stable Unit Treatment Value Assumption 
(SUTVA) [Rubin (1990)] holds so that for each case the potential outcomes 
are unique and well defined. 

The treatment effect for each individual is Yi — Yq- Typically this is sum- 
marized by its mean across study participants. However, we cannot directly 
estimate the treatment effect for individual cases or the mean across cases 
because cases cannot be observed under both the treatment and the control 
conditions. The condition under which each case is observed is determined 
by the random treatment assignment variable T, which equals 1 for treat- 
ment and for control. When T = 1, we observe Yi, otherwise Yq is observed. 
This results in the random variable Y^bs = Yt- 

Unbiased estimation of the average treatment effect is possible if cases 
with T =1 have the same expected values of their potential outcomes as 
cases with T = 0. Under this assumption, £J(Y'obsl^ = 1) = ^O^i) 
-E'(Y'obs|r = 0) = E(Yq), where expectation is over the participating cases. 
Hence, the difference in the observed treatment and control means yields an 
unbiased estimate of the average causal effect of treatment. 

Consistent estimation is also possible in situations where treatment as- 
signment might depend on observable characteristics, such as observational 
studies where study participants self-select into the treatment programs be- 
ing studied. In such circumstances, consistent estimates can be obtained by 
comparing the means for cases with the same probability of treatment and 
using propensity score weights to adjust for selection effects into a particular 
treatment [Rosenbaum and Rubin (1983)]. 

In the presence of institutionalization, the Neyman-Rubin causal model 
described above must be expanded to allow for cases to have potential lev- 
els of institutionalization, which we denote by Zq and Zi for the control 
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and treatment conditions, respectively. The model must also allow for dif- 
ferent potential outcomes at each level of institutionalization. For example, 
a case might have a different potential outcome when institutionalized 
days, compared to 1, 2 or 3 or more days. We let .^max equal the maximum 
possible value for institutionalization. Then, for treatment, T = 1, we la- 
bel the potential outcomes for a case as Yi[z], z = 0, . . . , Z^g^^ so that YifO] 
is the potential outcome if assigned to treatment and not institutionalized 
during the follow-up period and Y\\\\ is the potential outcome if assigned 
to treatment and had 1 day of institutionalization, and so on to .^max- The 
potential outcomes for control are Yq\z\^ z = 0, . . . , .^max- 

Now, for each case, we can define different treatment effects for each 
of the different levels of institutionalization, for example, D\z\ = y\\z\ — 
lol^] such that D\z\ might change with z. While there are multiple causal 
effects that might be of interest, not all may be estimable from the data 
without strong assumptions. We might, for example, be interested in the 
average causal effect of D[0], the average causal effect had no one been 
institutionalized, which McCaffrey et al. (2007) refer to as the "unsuppressed 
treatment effect." Additionally, we might also want to estimate the average 
causal effect for cases at each of the observed values of institutionalization 
as was considered when the notion of principal stratification was derived 
[Prangakis and Rubin (2002)]. We now turn to that concept. 

3.2. Principal stratification. To estimate the causal effects of treatment 
using the notions of principal stratification, we need a model for both the 
outcomes and institutionalization. Specifically, we assume that institution- 
alization can take on only a discrete set of values 0,l,...,Zmax for both 
treatment and control. For example, if the follow-up interval for observing 
outcomes is 90 days, then Zi and Zq can only take on a value from to 90. 

Under this assumption, we can in turn define ^max * -^max principal strata 
as shown in Table 2. We define the probability that an individual falls into 
stratum {zo,zi) by Pzo,zi = Pf{Zo = zq, Z\ = zi} and define a density func- 
tion for the potential outcome ItfztJiZo = zq, Zi = zi if a person falls into 
this stratum by fiy.O-^^^-^-^^t) for zo,zi = l,...,Zmax, where ^^o.^i,* denotes 
the parameter vector for the assumed underlying density function / and t 
denotes whether or not an individual received treatment (t = 1) or control 
(t = 0). We note that the probabilities, Pzq,zi^ do not depend on treatment 
status because Zq and Zi are potential outcomes and a case's principal 
stratum remains the same regardless of which treatment he/she receives. 
Conversely, the distribution of the potential outcomes, /, is determined by 
the principal strata and treatment status of an individual. For example, if 
we can assume the potential outcomes are normally distributed, then we 
might have separate means and variances in 6zQ^zi,t for each possible pair of 
values of {zq,zi) for both treatment and control. 
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Table 2 

Principal stratification model for institutionalization under treatment t = 0, 1 





Zo = 


1 




Zi = 


(poo,/(2/;fo,o,t)) 


(pio,/(2/; 6*1,0,0) 














>x,0,0) 


1 


(poi, f{y; 00,1, t)) 


(pii,/(y; 6*1,1,*)) 


(PZmax,l, 












>x,l,0) 


2 


(po2,/(j/; 6*0,2,*)) 


(pi2,/(2/; 6*1,2, t)) 


(PZmax,2, 












>x,2,0) 








(PZmax,Zm 






/(y;0o,z.„..,t)) 


/(y;0i,z„ex,O) 




r,k_,t)) 



The primary treatment effects of interest in this model are E[D[z]\Zq = z, 
Z\ = that is, the treatment effects for individuals with the same level 
of institutionalization under both treatment and control. These treatment 
effects do not confound changes in institutionalization with other effects of 
treatment and allow policy-makers, stake holders and caregivers to evaluate 
treatment independent of the potentially costly and undesirable effects on 
institutionalization. Each average effect is restricted to cases from a principal 
stratum. Generalizations to other strata require additional assumptions. 

We let Szt,t denote the set of all cases in condition t whose observed value 
of institutionalization equals zt- Then if yi denotes the observed outcome 
for the ith case, the likelihood for the observed data is given by 

)PZQ 

2^1 je5z-^,i ^0 

(3.1) 

n n ^^o,^i,oKo,^i- 

^0 i&SzQ,0 ^1 

This mixture distribution results from the fact that only Zi or Zq is ob- 
served for each case. A similar likelihood was presented for an application 
of principal stratification methods for estimating the causal effect of a jobs 
program [Zhang, Rubin and Mealli (2005)]. In their model, Zmax was 1 and 
the potential outcomes If did not exist when Zt = l. In our case, outcomes 
can exist at every level of institutionalization. 

Weights can be easily incorporated into the likelihood function in equa- 
tion (3.1), allowing for maximization of a weighted likelihood function. A 
natural application of such a weighted likelihood includes the maximization 
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of the likelihood when comparing propensity score weighted clients in one 
treatment program to unweighted clients in another, as will be illustrated 
in Section 3.3 below. 

For many common distributions, mixture likelihoods like (3.1) can be 
optimized using the EM algorithm. Theoretically, we can use the EM algo- 
rithm to estimate causal effects with any set of data. However, in practice, 
the solution is unlikely to be so simple. First, if there are many levels of Z to 
model, there will be many possible levels of institutionalization; then, with- 
out any additional structure, there will be a very large number of parameters 
to estimate. To diminish the dimensionality of the problem, one can con- 
sider modeling Qz(,,zx,t as low order polynomials in a-nd z\. For example, 
we might assume E{Y\\Z\ = zi, Zq = zq) = fi + (3izi + (3qZo + ^yziZQ. Second, 
it is well known that the convergence of likelihood optimizers for mixture 
models can be sensitive to the starting values [Biernacki, Celeux and Gerard 

(2003) , Karhs and Xekalaki (2003), McLachlan (1988), Seidel and Sevcikova 

(2004) ]. Given the large numbers of mixtures involved in this likelihood and 
the fact that identification of the parameters will depend on matching the 
mixing proportions/probabilities across groups, sensitivity to starting val- 
ues is likely and careful consideration of these values will be required. We 
examine these issues more carefully in Section 4. 

3.3. Application of principal stratification to estimating modality effects 
on SFS in the ATM. To tease out the effect of treatment modality on SFS 
in the presence of institutionalization, we fit a four strata principal stratifi- 
cation model to our data from the ATM study, where Zq and Zi can only 
take on two values, namely, and 1. Thus, we dichotomized days institu- 
tionalized such that Zt = a an individual had days of institutionalization 
and Zt = 1 otherwise for t = 0,l. We let t = 1 and = denote residential and 
outpatient care treatment modalities, respectively. This model allows us to 
compute treatment effects among adolescents who would not be institution- 
alized under both treatment and control and among adolescents who would 
be institutionalized under both treatment and control. 

Because SFS has a very skewed distribution with many observed zeros 
(see Figure 1), we assumed the underlying distribution for Y within each 
stratum was tobit [Maddala (1983), e.g., f{y; 0^0,^1,*) = ^(0; r/*„^^, C?)^(^^°) x 

Qiy'iViozi^Ciy^^'^^^ ■> where G{y;ri,('^) and g{y;rjX'^) denote the distribution 
and density functions, resp., for a normal random variable with mean r] and 
variance C^]. The parameters and ^ depend on treatment to allow 

for treatment effects. As parameterized, 'ntozi and are not the mean and 
variance of the tobit distribution but of the truncated normal which defines 
the tobit. 

Because the ATM is an observational study, there were observable differ- 
ences in the pretreatment characteristics of youths entering residential and 
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outpatient care. Given these pretreatment differences, any observed differ- 
ences in treatment group outcomes could result either from differential effec- 
tiveness of the treatment modalities, or because of differences in how hard 
their respective populations are to treat. In order to isolate just the treat- 
ment effects of interest, we must compare treatments on equivalent cases. 
Thus, for the case study, we compare the effectiveness of the two modalities 
on cases like those in the ATM sample who entered the residential modal- 
ity. We achieve this comparison by weighting the outpatient sample so that 
it closely matches the residential sample in terms of the distribution of 86 
pretreatment variables expected to be related to substance use and treat- 
ment assignment [Morral, McCaffrey and Ridgeway (2004)]. Details on the 
weighting and comparison of the weighted groups can be found in McCaffrey 
et al. (2007). 

In the remainder of the analysis reported in this example, we compare the 
unweighted residential sample (n = 770) to the weighted outpatient sam- 
ple (n = 486, effective sample size = 125), a comparison designed to exam- 
ine whether the residential modality produces better 12-month outcomes 
than outpatient care for cases with pretreatment characteristics like those 
of clients who usually enter residential care. We maximize the weighted like- 
lihood in (3.1) using the EM algorithm. Preliminary results suggested that 
estimates could be very sensitive to starting values used with the EM al- 
gorithm. Consequently, we developed the following approach for selecting 
starting values. 

First, among the residential cases observed to have Zi = 0, we fit a 
mixture model of two normal distributions using a simple EM algorithm 
[Dempster, Laird and Rubin (1977)] to obtain initial estimates of the two 
means and their corresponding mixing proportions for this group. We la- 
beled the two means /Xq a P-o B ■ Under our model, we know that the 
observed means for residential cases with Zi = is a mixture of /Xqq and 
, so /Xq ^ is an estimate of one of these means and /ig ^ is an estimate of 
the other; however, we do not know which is which. We thus used starting 
values which allow for both possible mappings in this group. 

We repeated this procedure for residential cases with Zi = 1 and out- 
patient cases for both Zq equal to and 1. These steps, in turn, yielded 
preliminary estimates of the eight mean parameters of the model and their 
associated mixing proportions. All that remained was to determine how the 
eight means mapped to the model parameters. There are 16 possible map- 
pings from the preliminary estimates of the simple mixture models to the 
parameters of the model. For example, one mapping assumes that (/ioA) 
^o,B' /^i,A' ^i,_B' /^A.O' Ms,0' A'-A,!' estimates (/Xq^q, ^Iq ^, ^\ q, fil ^, /Xqq, 
fJ-iQ, fJ-Qi, Ml i)- The remaining 15 mappings are obtained by switching the 
mapping of A and B for each observed value of Zi and Zq to and 1 
exhaustively. 
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To obtain the final parameter estimates of the model, we ran the EM 
algorithm to maximize (3.1) separately for each of the 16 possible mappings 
of the estimates from the simple mixture models and used as final estimates 
the solution that maximized the log likelihood among all 16 runs. Simula- 
tions described below suggested that this procedure can successfully recover 
the global maximum from the many local maxima. 

It is critical that multiple starting values are utilized when applying prin- 
cipal stratification to data. Figure 2 shows the sensitivity of resulting pa- 
rameter estimates from the 16 different starting values used in our analysis. 
The results from each set of starting values are plotted as vertical bands and 
denoted by a number from 1 to 16. Each of the 8 estimated model means is 
denoted by a row and the resulting estimated values for each mean are plot- 
ted for each set of starting values using vertical bars. The first row plots the 
percentage increase in the negative log-likelihood for each solution compared 
to the value that minimized it across all sets of starting values (estimate 11 
in Figure 2). Each set of starting values leads to a different possible solution, 
all of which represent a local minimum of the negative log-likelihood in the 
data. Although the resulting log-likelihood values are very similar (differing 
by no more than 2 percent), each solution gives very different inferences 
about the estimated strata means and treatment effects that exist within 
each stratum. As shown, the solution which gives the minimum negative log 
likelihood value (denoted by estimate 11 in Figure 2) is distinctly different 
even from the next best estimate of the model parameters (estimate 6 in 
Figure 2) which assigns the estimated values of the means for fiQ q and Hq i, 
[iIq and fi\i, /Iqq and /x^o, and /Iqi and /Li? i in reverse of how they are 
assigned in the solution. More generally, it is clear that the alternative solu- 
tions tend to find similar values for the various mean values but "flip" the 
strata labels associated with those labels. 

The mixture of tobit models appears to fit the data well, as shown in 
Figure 3 which plots the histogram of fitted probabilities for each observed 
condition in our data. Goodness of fit can be noted by the grouping of fitted 
probabilities at and 1 for the cases of each observed condition. These 
groupings imply that the majority of our cases have a high probability of 
falling into one of the two strata of which their observed condition is a 
mixture. If there had been a more even spread of these values, we would 
worry about the goodness of fit of the model. Moreover, Table 3 shows 
that the principal stratification model maintains the marginal means and 
probabilities of this model. 

The estimated treatment effects from the mixture model are shown in 
Table 4. In contrast to the patterns shown in Table 1, when we control for 
the confounding effects of institutionalization using principal stratification, 
residential treatment leads to significantly worse outcomes among adoles- 
cents who would experience the same level of institutionalization under both 
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Fig. 2. The values of the negative log-likelihood and the estimated means for each of the 
16 starting values used in the ATM analysis. Each quantity has a separate y-axis and the 
estimated value of that quantity for each set of starting values is denoted by the heights of 
the vertical bars. 
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Fig. 3. Histogram of fitted probabilities for each observed condition. 
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treatment modalities. The effect of residential treatment is larger among 
adolescents who are not institutionalized under both treatment modalities. 

The standard errors for treatment effects within each stratum can be 
adjusted for the effects of clustering (e.g., here adolescents are clustered 
within treatment sites) by using a Huber-White sandwich estimate of the 
variance-covariance matrix for the parameters in the likelihood function 
[Skinner, Holt and Smith (1989)]. When we control for clustering within 
the ATM data, the treatment effects are no longer significant within the 
two strata with the same levels of institutionalization under both treatment 
modalities. 

4. Evaluation of principal stratification. Estimating the treatment ef- 
fects of treatment modality via principal stratification requires identification 
of latent variables from complex mixture data using mixture proportions. In 
particular, means for cases from different principal strata are identified from 
the mixing proportions within observed groups. Given that our exploration 
of starting values suggested that the identification of strata is potentially 
weak and that label swapping might be possible, we felt it important to 
explore the properties of the principal stratification method before inter- 
preting our findings on substance abuse treatment. To our knowledge, there 
have been very few studies using either real or simulated data sets to explore 
the convergence of the estimation algorithms, the identification of desired 
parameters, and the precision of the estimates with varying sample sizes for 
treatment effects from mixture models like model (3.1). 



Table 3 

Observed versus predicted marginal means for residential and outpatient care cases 





Residential 


care 


Outpatient 


care 


Predicted 


Observed 


Predicted 


Observed 


Mean SFS for Z^hs = 


11.8 


9.8 


9.9 


10.2 


Mean SFS for Zobs = 1 


8.1 


8.2 


7.8 


8.1 


Proportion institutionalized 


51% 


52% 


43% 


38% 



Table 4 

Treatment effect estimates and standard errors comparing residential to outpatient 
treatment using the four-strata principal stratification model. Significant treatment effects 
are denoted by * and standard errors are unadjusted for clustering 





Zo=0 


Zo>0 


Zi=0 


4.1 (1.7)* 


1.3 (0.9) 


Zi>0 


-1.1 (0.9) 


3.3 (1.8)* 
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4.1. Methods. To evaluate the method of principal stratification, we be- 
gan by examining the performance of the four strata model where Zq and Zi 
can only take on two values, namely, and 1. Additionally, we assumed that 
the underlying distribution of the outcome, f{y',OzQ^zi,t), within each strata 
is normal with mean fJ-^zozi variance fjf , where the means and variances 
can depend on treatment so that treatment effects can exist. We conducted 
a simulation study of the properties of parameters estimated by maximizing 
(3.1) under this assumed parametric model. 

First, data was simulated under this model and the impact of sample 
size, the value of the principal strata probabilities, and the dispersion of the 
means within each level of t and Zt on model performance was examined. 
Second, data was simulated under heavy tailed and skewed distributions and 
analyzed using the normal model to examine the sensitivity of the normal 
model to model misspecification. In each case, we maximized the likelihood 
using the methods described above and compared the resulting estimates to 
the values used in generating the data. 

4.2. Results. First, we examined the performance of the estimators for 
the four strata model under various assumptions about the sample size per 
treatment arm (N), the strata probabilities and the dispersion of the means 
between strata within the same level of t and Zf. Figures 4, 5 and 6, respec- 
tively, plot the estimated means for the control group, the estimated means 
for the treatment group and the estimated probabilities for each strata versus 
their true values under the various cases considered. As shown, the method 
did not begin to perform well until the means within a given level of t and Zt 
were at least 1.6 standard deviations away from each other and the sample 
size within each treatment arm was at least N = 1000. See supplementary 
material for tabulated results [Griffin, McCaffrey and Morral (2008)]. 

The performance of the estimators was relatively invariant to the true 
strata probabilities unless the probabilities were uniform, as shown in the 
third column of Figures 4 and 5. Specifically, we examined three cases for 
strata probabilities, one in which all four strata had reasonably large prob- 
abilities, another in which one strata had a particularly small probability 
of 0.05, and a third in which the probabilities were equal. Performance 
was similar in the two cases where the probabilities were not all equal, as 
shown in the first and second columns of Figures 4 and 5. When the prob- 
abilities were equal, the method could not identify the correct mapping of 
the mixture components to the principal strata values because the model 
was under-identified. Consequently, the maximum likelihood estimation was 
unable to recover the model parameters. 

We also examined the sensitivity of the four strata model to model mis- 
specification [Griffin, McCaffrey and Morral (2008)]. As expected, the method 
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Fig. 4. Simulation results. Plot of fitted means (black dots) versus true values (indicated 
by dashed lines) for the four strata in the control group with means for other strata (indi- 
cated as dotted gray lines) for N = 100, 1000 and 5000, different dispersions of the means, 
and different assumptions about the strata probabilities. 



was sensitive to extremely heavy tailed and skewed distributions. The algo- 
rithm appeared to perform well when the data had only moderately heavy 
tails or moderate skew. 

Unfortunately, our simulation study results suggest that our results for 
the ATM study must be interpreted with caution. While the dispersion of 
means within each level of institutionalization in the ATM data is 3.04, the 
effective sample size in the weighted control group is quite small, only 125, 
drawing into question the ability of this model to converge to the correct 
solution if the principal stratification model holds in our data. 

Our simulation study analysis was repeated for a nine strata model in 
which Zt can take on three values, namely, 0,1 and 2. However, a number 
of problems were encountered. First, the number of possible sets of starting 
values (e.g., mappings between the parameters of the full model to estimates 
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Fig. 5. Simulation results. Plot of fitted means (black dots) versus true values (indicated 
by dashed lines) for the four strata in the treatment group with means for other strata 
(indicated as dotted gray lines) for N = 100, 1000 and 5000, different dispersions of the 
means, and different assumptions about the strata probabilities. 



from simple mixtme models fit to data) increased significantly, leading to 
219^ possible sets of starting values. Given the computational demands of 
such a large number of starting values, we did not run the EM algorithm for 
all possible mappings. Instead we calculated the likelihood at each possible 
mapping and then tried two techniques for selecting reasonable starting 
values: (i) choosing the 30 starting values which gave the 30 greatest values of 
the log-likelihood and (ii) choosing the 30 starting values which represented a 
spread of the log-likelihood surface. We found that option (i) yielded the best 
solution (i.e., the solution which gave the highest value of the log- likelihood) . 
However, even with a dispersion of means equal to 2.5 and = 5000 per 
treatment arm, the best solution from option (i) was unable to recover the 
model parameters used to simulate the data (results available upon request). 
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Fig. 6. Simulation results. Plot of fitted probabilities (black dots) versus their true values 
(indicated by dashed lines) for the four strata for N = 100, 1000 and 5000, different 
dispersions of the means, and different assumptions about the strata probabilities. 



It is likely that adding more parametric assumptions into the nine strata 
model would help improve the fit of the model. For example, one could 
consider modeling the means within each stratum and treatment group as 
a linear function of the value of Zi and/or Zq. However, as the number of 
strata increases, more assumptions will be required which may or may not 
be likely to hold in practice. Inevitably, extending principal stratification to 
more strata becomes intractable and unidentifiable beyond the simple four 
strata model. 

5. Discussion. Institutionalization during the follow-up poses serious chal- 
lenges to estimating treatment effects on outcomes following substance abuse 
treatment because it restricts opportunities to use drugs or partake in other 
problem behaviors (e.g., criminal activity or risky sexual activity). If unac- 
counted for, it can confound treatment effects and lead to incorrect inferences 
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about the ability of treatment to produce desirable outcomes. Similar prob- 
lems can occur in many other settings, such as criminal justice and mental 
health outcomes studies where study participants are also at high risk for 
institutionalization during the follow-up period. 

Because institutionalization occurs post-treatment, treating it as a co- 
variate in analyses can lead to biased results since cases with various lev- 
els of institutionalization might differ in terms of their potential outcomes 
[McCaffrey et al. (2007)]. Most common approaches to the problem require 
strong assumptions and the results can be very sensitive to which assump- 
tions are made and which methods are used [McCaffrey et al. (2007)]. For 
example, use of a joint or composite outcome which looks at the effects of 
treatment on institutionalization and SFS together conflates the effects that 
treatment has on institutionalization with the effects it has on SFS. Prin- 
cipal stratification allows us to directly model how the treatment effects on 
SFS may vary within different levels of institutionalization. 

Principal stratification provides a framework for developing causal effects 
on outcomes in the presence of post-treatment institutionalization. The key 
idea is that causal effects can be obtained by conditioning on cases with 
equal values for the observed level of institutionalization for the observed 
treatment status and the unobserved level of institutionalization for the un- 
observed treatment status. In situations where institutionalization can take 
on discrete values, the latent principal strata result in finite mixture models 
and the parameters of interest can potentially be estimated. When applied 
to adolescent substance abuse treatment data in the ATM study, the method 
suggests that residential treatment may be less effective for youth who are 
likely to experience the same level of institutionalization under both con- 
ditions (residential and outpatient). However, the effect is strongest among 
those youth who are unlikely to be institutionalized following treatment — 
that is, the least problematic users. Alternative analyses [McCaffrey et al. 
(2007)] found a similar result which improves our confidence in these find- 
ings, despite the challenges of the method cited above. 

As discussed in the Introduction, stakeholders, like parents, want to know 
if treatment is effective and for whom it is effective. The principal stratifica- 
tion approach estimates causal effects for youth in different principal strata. 
Thus, we aim to know if treatment is effective for youth within each principal 
stratum where institutionalization is constant across treatment conditions. 
However, the principal strata do not necessarily provide meaningful classifi- 
cation of adolescents to stakeholders. An important follow-up to our analysis 
is to determine if principal strata membership can be described using mea- 
surable and meaningful baseline variables so that stakeholders might have a 
better idea about which youths will be best served by treatment or different 
treatment modalities. 
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Beyond this application, our investigation of principal stratification sug- 
gests that the potential of principal stratification may be impossible to 
realize in many empirical studies. Our study shows that the differences 
in outcome means among the principal strata must be quite large (1.6 
standard deviation units or larger) and that the sample sizes must also 
be very large {N > 1000) for estimated effects to correctly identify which 
means belong to each principal stratum. Unfortunately, such dispersion of 
means and sample sizes are much greater than those that are likely to 
arise in many applications. With smaller samples sizes and less dispersed 
means, group labels are often switched; for instance, the means of the Zi = 
Zq = stratum might be incorrectly labeled as the means for the Z\ = \ 
and Zq = 0, stratum yielding incorrect inferences about treatment effects. 
Additionally, the switched mean values may be very plausible so that there 
would be no indication of misleading results. The limitations of principal 
stratification must be considered carefully before using the method in all 
but the most ideal settings. 

As the possible values for institutionalization grow, the problem quickly 
becomes more unmanageable. The number of parameters for the principal 
strata and the number of mixtures that need to be identified grow rapidly; 
even finding starting values becomes an extremely challenging task as the 
possible values for institutionalization grow. Estimation in these contexts 
will require additional assumptions about the relationship between outcome 
and institutionalization (e.g., outcomes decline linearly with increased insti- 
tutionalization) and assumptions about the joint distribution of institution- 
alization under treatment and control to make the problem more tractable. 

Given the challenges of principal stratification when we consider many val- 
ues for institutionalization, a possible option may be to ignore the outcomes 
from institutionalized cases and try to estimate an unsuppressed treatment 
effect restricted to cases that would not be institutionalized under either 
treatment or control. Our approach to starting values could be used and 
trustworthy estimates might be obtained in some real world settings. The 
limitation of this approach would be the inability to generalize the estimates 
beyond cases that are likely never to be institutionalized. That is, we could 
not estimate the unsuppressed effect for all cases nor could we determine 
treatment effects on institutionalized youth. 

Using a Bayesian approach might address some of the computational prob- 
lems with maximizing the likelihood since integrating over a prior helps 
to reduce sensitivity to the starting values. However, special care would 
be needed when using Markov chain Monte Carlo methods to sample the 
posterior to avoid label switching of the mixtures within the observed val- 
ues of institutionalization [Jasra, Holmes and Stephens (2005)]. Strata label 
switching will occur because the data provide only very indirect informa- 
tion about the joint distribution of potential institutionalization. Use of an 
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informative prior would be one means of overcoming this limitation of the 
data via Bayesian analysis. However, this approach is unappealing because 
analysts are unlikely to be able to make good informed guesses about the 
joint distribution so that informed priors would be difficult to choose. 

Obtaining sufficient information about the joint distribution of the poten- 
tial institutionalization outcomes is a clear challenge to using the framework 
of principal stratification to provide useful estimates of treatment effects. 
One approach to obtaining more information about this distribution would 
be to collect data at baseline that might be strongly related to the princi- 
pal strata and use the information as covariates in the model for the strata 
probabilities. For institutionalization following substance use treatment this 
information might include criminal activity, detailed information about in- 
volvement with the criminal justice system, history with substance abuse 
treatment, the availability of various types of substance about treatment 
and sources of payment for the treatment. 

Another approach for increasing the information about the principal strata 
is to jointly model multiple outcomes such as multiple indicators of sub- 
stance use and criminal activity. Because principal strata are defined by 
institutionalization and not by other outcomes, principal strata designation 
should not depend on which outcome is modeled. Combining multiple out- 
comes can therefore provide more information about the latent principal 
strata membership and should improve estimation of treatment effects for 
every outcome. A downside of this approach is the necessity of specifying the 
joint distribution for multiple outcomes, but the additional modeling might 
be very beneficial for identifying the principal strata and the accuracy of the 
resulting treatment effect estimates. 

We might consider simply using the principal stratification framework for 
sensitivity analyses. For example, rather than trying to model the joint dis- 
tribution of Zq and Zi , we might specify the joint distribution in terms of 
the parameters of the marginal distributions of Zq and Zi and a parameter 
specifying their correlation. The value of the correlation parameter could 
be manipulated to study the sensitivity of the treatment effect estimates to 
different assumptions about the principal strata. For example, when institu- 
tionalization can take on just two values, we could specify the correlation by 
the interaction from a log-linear model. For continuous institutionalization, 
we could specify the correlation parameter. 

Principal stratification is an important tool for approaching the problem 
of institutionalization during the follow-up, because it provides a framework 
for defining estimates of interest and possible methods for estimating them. 
However, it is also clear that it is not a panacea to the problem because 
the model parameters are at best weakly identified and it does not extend 
easily to problems with many possible values for institutionalization. The 
extensions described above provide useful areas for future research that may 
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significantly improve the application of principal stratification in real world 
settings. 
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SUPPLEMENTARY MATERIAL 

Supplementary tables for "An application of principal stratification to 
control for institutionalization at follow-up in studies of substance abuse 
treatment programs" (DOL 10.1214/08-AOAS179SUPPA; .pdf). This file 
contains tabulated results to simulation study of principal stratification 
method. 

Example data for running principal stratification model in "An 
application of principal stratification to control for institutionalization at 
follow-up in studies of substance abuse treatment programs" 

(DOL 10.1214/08-AOAS179SUPPB; .csv). This file contains dataset de- 
scribed in paper. 

Example code for running principal stratification model in "An 
application of principal stratification to control for institutionalization at 
follow-up in studies of substance abuse treatment programs" 

(DOL 10.1214/08-AOAS179SUPPC; .txt). This file contains code used to 
run models in paper. 
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